Indexing Factors in DNA/RNA Sequences

Authors

  • Tomás Flouri
  • Costas S. Iliopoulos
  • Mohammad Sohel Rahman
  • Ladislav Vagner
  • Michal Vorácek

Abstract

In this paper, we present the Truncated Generalized Suffix Automaton (TGSA) and an efficient on-line algorithm for its construction. TGSA is a novel type of finite automaton suitable for indexing DNA and RNA sequences where the text is degenerate, i.e., contains sets of characters. TGSA indexes the so-called k-factors, the factors of the degenerate text whose length does not exceed a given constant k. The presented algorithm works in O(n) time, where n is the length of the input DNA/RNA sequence. The resulting TGSA has at most a linear number of states with respect to the length of the text. TGSA enables us to find the list occ(u) of all occurrences of a given pattern u in a degenerate text x̃ in time |u| + |occ(u)|.
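
To make the notions of degenerate text, k-factor, and occ(u) concrete, here is a minimal sketch. It is not the TGSA and not the paper's algorithm: it is a naive dictionary index, and its construction cost grows with the number of strings each window can spell rather than staying O(n). The names `k_factor_index` and `occ`, the list-of-sets representation, and the choice to report 0-based start positions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def k_factor_index(text, k):
    """Map every factor of length 1..k of a degenerate text to its start positions.

    `text` is a list of sets of characters, one set per position
    (an assumed representation; the paper may use a different one).
    """
    index = defaultdict(list)
    n = len(text)
    for start in range(n):
        for length in range(1, min(k, n - start) + 1):
            window = text[start:start + length]
            # Expand the degenerate window into every concrete string it can spell.
            for letters in product(*(sorted(s) for s in window)):
                index["".join(letters)].append(start)
    return dict(index)


def occ(index, u):
    """Return the 0-based start positions of pattern u (empty if |u| > k or u is absent)."""
    return index.get(u, [])


# Degenerate text A [CG] T A [CG]: positions 1 and 4 each allow C or G.
x = [{"A"}, {"C", "G"}, {"T"}, {"A"}, {"C", "G"}]
idx = k_factor_index(x, 3)

print(occ(idx, "AC"))    # [0, 3]
print(occ(idx, "GTA"))   # [1]
print(occ(idx, "ACTA"))  # [] because the pattern is longer than k = 3
```

A lookup in this sketch is a single hash-table access plus the size of the answer, which mirrors the |u| + |occ(u)| query bound stated above. The point of the TGSA, according to the abstract, is to obtain that query behaviour from an automaton with at most linearly many states that is built on-line in O(n) time.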

Similar resources

A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...

Relation Between RNA Sequences, Structures, and Shapes via Variation Networks

Background: RNA plays a key role in many aspects of biological processes, and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...

Determination of Hepatitis Delta Virus Genotype among HBV Carriers in Southwest of Iran

Background and Aims: HDV is a defective satellite virus classified in the genus Deltavirus. Its disease is related and limited to HBV-infected patients. Acute infection with the delta agent occurs in two different patterns: simultaneous infection with both HBV and HDV, or superinfection of chronically HBV-infected patients, which leads to a more severe type of hepatitis. According to genetic diversity of geno...

An Enterovirus-Like RNA Construct for Colon Cancer Suicide Gene Therapy

Background: In gene therapy, the use of RNA molecules as therapeutic agents has shown advantages over plasmid DNA, including higher levels of safety. However, the transient nature of RNA has been a major obstacle in the application of RNA in gene therapy. Methods: Here, we used the internal ribosomal entry site of encephalomyocarditis virus and the 3’ non-translated region of Poliovirus to design an en...

Publication date: 2008